scLANEIn this vignette we’ll focus on one of the more complicated features of scLANE: the usage of GEE models to identify dynamic genes when cells are grouped by some subject identity. The robust variance estimation available in GEE models allows us to make decisions about which genes are actually dynamic over pseudotime / latent time while accounting for intra-subject correlations between observations. The downsides are that the benefits of GEE models can be a little more difficult to explain to collaborators, and they take a little bit longer than GLMs to estimate. A refresher on what GEEs are and why they’re useful can be found here.
library(dplyr) # data manipulation
library(scran) # single cell tools
library(scLANE) # differential expression over pseudotime
library(scater) # single cell tools
library(slingshot) # pseudotime estimationFirst we’ll load in the lung tumor data from Zilionis et al (2019), and subset to only include tumor neutrophils. The authors found 5 continuous tumor neutrophil subsets in their analysis, and we’re interested in determining which genes drive subset-to-subset differentiation. The data were collected from 7 different patients, and thus the usage of GEE models is warranted in order to properly estimate within-patient gene expression correlations.
lung <- scRNAseq::ZilionisLungData(which = "human", filter = TRUE)
lung <- lung[, stringr::str_detect(lung$`Major cell type`, "tNeutrophils")]First we’ll preprocess the data in the typical way. We see that the neutrophils fall into 5 clusters, and that there is one odd outlier group of cells near the top of the UMAP plot. In addition, there is a decent amount of intra- and inter-patient variability. The cluster labels are not solely reflective of the cell subtypes, and seem to be driven by both patient identity as well as cell subtype.
lung <- logNormCounts(lung)
var_decomp <- modelGeneVar(lung)
top2k_hvgs <- getTopHVGs(var_decomp, n = 2000)
lung <- runPCA(lung, subset_row = top2k_hvgs)
reducedDim(lung, "PCAsub") <- reducedDim(lung, "PCA")[, 1:10, drop = FALSE]
lung <- runUMAP(lung, dimred = "PCAsub", n_dimred = 1:10)
g <- buildSNNGraph(lung, use.dimred = "PCAsub", k = 40)
clusters <- igraph::cluster_louvain(graph = g)$membership
colLabels(lung) <- factor(clusters)
plotUMAP(lung, colour_by = "label")plotUMAP(lung, colour_by = "Patient")plotUMAP(lung, colour_by = "Minor subset")We’ll use slingshot to estimate a pseudotime value for each cell. In this case, we see only one pseudotime lineage.
scl_lineage <- getLineages(reducedDim(lung, "PCAsub"), clusterLabels = clusters)
scl_crv <- getCurves(scl_lineage)
scl_cell_weights <- slingCurveWeights(scl_crv)
scl_pt <- slingPseudotime(scl_crv)
pt_df <- as.data.frame(scl_pt) %>%
mutate(across(everything(), function(x) x / max(x, na.rm = TRUE)))scLANEWe’re going to run scLANE solely on the top 10 most highly variable genes. This is for computational quickness, as otherwise the models would take a couple hours to run & this is a tutorial. Note that we need to provide a (sorted) vector of subject IDs as well as a working correlation structure. In this case we opt for the exchangeable (AKA compound symmetry) structure, as it stands to reason that within-subject cells will have similar correlations.
lung_counts <- t(lung@assays@data$counts)
lung_counts <- as.matrix(lung_counts[, which(colSums(lung_counts) > 0)])
gene_stats <- testDynamic(expr.mat = lung_counts[, colnames(lung_counts) %in% top2k_hvgs[1:10]],
genes = top2k_hvgs[1:10],
pt = pt_df,
is.gee = TRUE,
id.vec = lung$Patient,
cor.structure = "exchangeable",
parallel.exec = TRUE,
n.cores = 4,
track.time = TRUE)[1] "testDynamic evaluated 10 genes with 1 lineages apiece in 8.388 mins"
Let’s check out the results of the global differential expression test.
getResultsDE(gene_stats) %>%
select(-contains("LogLik"), -contains("Dev")) %>% # not relevant to GEEs
kableExtra::kbl(digits = 5,
align = "c",
booktabs = TRUE,
caption = "Lineage-level dynamic test",
col.names = c("Gene", "Lineage", "Test Stat.", "P-value", "Test Stat. Type",
"Model Status", "Adj. P-value", "Gene Dynamic over Lineage", "Gene Dynamic")) %>%
kableExtra::kable_paper("hover", full_width = FALSE)| Gene | Lineage | Test Stat. | P-value | Test Stat. Type | Model Status | Adj. P-value | Gene Dynamic over Lineage | Gene Dynamic |
|---|---|---|---|---|---|---|---|---|
| CXCL8 | A | 2102.214 | 0 | Wald | MARGE & null model OK | 0 | 1 | 1 |
| IL1RN | A | 1388.697 | 0 | Wald | MARGE & null model OK | 0 | 1 | 1 |
| IL1B | A | 643.460 | 0 | Wald | MARGE & null model OK | 0 | 1 | 1 |
| RGS2 | A | 20333.466 | 0 | Wald | MARGE & null model OK | 0 | 1 | 1 |
| S100A8 | A | 139.937 | 0 | Wald | MARGE & null model OK | 0 | 1 | 1 |
| HSPA1A | A | 11437468.161 | 0 | Wald | MARGE & null model OK | 0 | 1 | 1 |
| PI3 | A | 0.000 | 1 | Wald | MARGE & null model OK | 1 | 0 | 0 |
| NFKBIA | A | NA | NA | Wald | MARGE model error, null model OK | NA | 0 | 0 |
| ACTB | A | NA | NA | Wald | MARGE model error, null model OK | NA | 0 | 0 |
| CCL4 | A | NA | NA | Wald | MARGE model error, null model OK | NA | 0 | 0 |
Next we can run the slope test to identify over which pseudotime intervals gene expression changes significantly.
testSlope(test.dyn.results = gene_stats) %>%
kableExtra::kbl(digits = 5,
align = "c",
booktabs = TRUE,
caption = "Slope-level dynamic test",
col.names = c("Gene", "Lineage", "Breakpoint", "Rounded Breakpoint",
"Direction", "P-value", "Notes", "Adj. P-value",
"Gene Dynamic over Lineage-Slope", "Gene Dynamic over Lineage",
"Gene Dynamic")) %>%
kableExtra::kable_paper("hover", full_width = FALSE)| Gene | Lineage | Breakpoint | Rounded Breakpoint | Direction | P-value | Notes | Adj. P-value | Gene Dynamic over Lineage-Slope | Gene Dynamic over Lineage | Gene Dynamic |
|---|---|---|---|---|---|---|---|---|---|---|
| ACTB | 1 | NA | NA | NA | NA | MARGE model error, null model OK | NA | 0 | 0 | 0 |
| CCL4 | 1 | NA | NA | NA | NA | MARGE model error, null model OK | NA | 0 | 0 | 0 |
| CXCL8 | A | 0.49615 | 0.4962 | Right | 0.00000 | NA | 0.00001 | 1 | 1 | 1 |
| CXCL8 | A | 0.49805 | 0.4980 | Right | 0.00000 | NA | 0.00001 | 1 | 1 | 1 |
| HSPA1A | A | 0.00576 | 0.0058 | Right | 0.00000 | NA | 0.00000 | 1 | 1 | 1 |
| HSPA1A | A | 0.76835 | 0.7683 | Right | 0.00000 | NA | 0.00000 | 1 | 1 | 1 |
| IL1B | A | 0.27310 | 0.2731 | Right | 0.12353 | NA | 1.00000 | 0 | 0 | 0 |
| IL1B | A | 0.27429 | 0.2743 | Right | 0.13126 | NA | 1.00000 | 0 | 0 | 0 |
| IL1RN | A | 0.00576 | 0.0058 | Right | 0.00000 | NA | 0.00000 | 1 | 1 | 1 |
| IL1RN | A | 0.28645 | 0.2865 | Left | 0.00000 | NA | 0.00000 | 1 | 1 | 1 |
| NFKBIA | 1 | NA | NA | NA | NA | MARGE model error, null model OK | NA | 0 | 0 | 0 |
| PI3 | A | NA | NA | NA | NA | No non-intercept coefficients | NA | 0 | 0 | 0 |
| RGS2 | A | 0.00576 | 0.0058 | Right | 0.00000 | NA | 0.00000 | 1 | 1 | 1 |
| S100A8 | A | 0.18537 | 0.1854 | Left | 0.01513 | NA | 0.16639 | 0 | 1 | 1 |
| S100A8 | A | 0.18667 | 0.1867 | Right | 0.00000 | NA | 0.00000 | 1 | 1 | 1 |
Lastly, we can plot the fitted model for a couple of the genes.
plotModels(test.dyn.res = gene_stats,
gene = "CXCL8",
pt = pt_df,
gene.counts = lung_counts,
is.gee = TRUE,
id.vec = lung$Patient,
cor.structure = "exchangeable")plotModels(test.dyn.res = gene_stats,
gene = "RGS2",
pt = pt_df,
gene.counts = lung_counts,
is.gee = TRUE,
id.vec = lung$Patient,
cor.structure = "exchangeable")sessioninfo::session_info()─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.0.4 (2021-02-15)
os macOS Big Sur 10.16
system x86_64, darwin17.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2022-04-30
─ Packages ───────────────────────────────────────────────────────────────────
package * version date lib source
AnnotationDbi 1.52.0 2020-10-27 [1] Bioconductor
AnnotationFilter 1.14.0 2020-10-27 [1] Bioconductor
AnnotationHub 2.22.0 2020-10-27 [1] Bioconductor
ape 5.5 2021-04-25 [1] CRAN (R 4.0.2)
askpass 1.1 2019-01-13 [1] CRAN (R 4.0.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
beachmat 2.6.4 2020-12-20 [1] Bioconductor
beeswarm 0.3.1 2021-03-07 [1] CRAN (R 4.0.2)
bigassertr 0.1.5 2021-07-08 [1] CRAN (R 4.0.2)
bigparallelr 0.3.1 2021-02-02 [1] CRAN (R 4.0.2)
bigstatsr 1.5.1 2021-04-05 [1] CRAN (R 4.0.2)
Biobase * 2.50.0 2020-10-27 [1] Bioconductor
BiocFileCache 1.14.0 2020-10-27 [1] Bioconductor
BiocGenerics * 0.36.0 2020-10-27 [1] Bioconductor
BiocManager 1.30.12 2021-03-28 [1] CRAN (R 4.0.2)
BiocNeighbors 1.8.2 2020-12-07 [1] Bioconductor
BiocParallel 1.24.1 2020-11-06 [1] Bioconductor
BiocSingular 1.6.0 2020-10-27 [1] Bioconductor
BiocVersion 3.12.0 2020-05-14 [1] Bioconductor
biomaRt 2.46.3 2021-02-11 [1] Bioconductor
Biostrings 2.58.0 2020-10-27 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
bitops 1.0-6 2013-08-17 [1] CRAN (R 4.0.2)
blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.2)
bluster 1.0.0 2020-10-27 [1] Bioconductor
broom 0.7.12 2022-01-28 [1] CRAN (R 4.0.5)
bslib 0.2.4 2021-01-25 [1] CRAN (R 4.0.2)
cachem 1.0.4 2021-02-13 [1] CRAN (R 4.0.2)
cli 3.1.1 2022-01-20 [1] CRAN (R 4.0.5)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.4)
colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2)
cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.0.2)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
curl 4.3 2019-12-02 [1] CRAN (R 4.0.1)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.0.2)
DelayedArray 0.16.3 2021-03-24 [1] Bioconductor
DelayedMatrixStats 1.12.3 2021-02-03 [1] Bioconductor
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
doParallel 1.0.16 2020-10-16 [1] CRAN (R 4.0.2)
dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.0.2)
dqrng 0.2.1 2019-05-17 [1] CRAN (R 4.0.2)
edgeR 3.32.1 2021-01-14 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2)
ensembldb 2.14.0 2020-10-27 [1] Bioconductor
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
ExperimentHub 1.16.0 2020-10-27 [1] Bioconductor
fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.3)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
flock 0.7 2016-11-12 [1] CRAN (R 4.0.2)
FNN 1.1.3 2019-02-15 [1] CRAN (R 4.0.2)
foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.2)
gamlss 5.3-4 2021-03-31 [1] CRAN (R 4.0.2)
gamlss.data 6.0-1 2021-03-18 [1] CRAN (R 4.0.2)
gamlss.dist 5.3-2 2021-03-09 [1] CRAN (R 4.0.2)
geeM 0.10.1 2018-06-18 [1] CRAN (R 4.0.2)
generics 0.1.2 2022-01-31 [1] CRAN (R 4.0.5)
GenomeInfoDb * 1.26.7 2021-04-08 [1] Bioconductor
GenomeInfoDbData 1.2.4 2021-01-12 [1] Bioconductor
GenomicAlignments 1.26.0 2020-10-27 [1] Bioconductor
GenomicFeatures 1.42.3 2021-04-04 [1] Bioconductor
GenomicRanges * 1.42.0 2020-10-27 [1] Bioconductor
ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.2)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.0.2)
glm2 1.2.1 2018-08-11 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.2)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
httpuv 1.5.5 2021-01-13 [1] CRAN (R 4.0.2)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.2)
interactiveDisplayBase 1.28.0 2020-10-27 [1] Bioconductor
IRanges * 2.24.1 2020-12-12 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 4.0.2)
iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.2)
jquerylib 0.1.3 2020-12-17 [1] CRAN (R 4.0.2)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.2)
kableExtra 1.3.4 2021-02-20 [1] CRAN (R 4.0.2)
knitr 1.37 2021-12-16 [1] CRAN (R 4.0.2)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.2)
later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.2)
lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.4)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.2)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
limma 3.46.0 2020-10-27 [1] Bioconductor
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.2)
magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
MASS 7.3-53.1 2021-02-12 [1] CRAN (R 4.0.2)
Matrix 1.3-2 2021-01-06 [1] CRAN (R 4.0.4)
MatrixGenerics * 1.2.1 2021-01-30 [1] Bioconductor
matrixStats * 0.58.0 2021-01-29 [1] CRAN (R 4.0.2)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.2)
mgcv 1.8-36 2021-06-01 [1] CRAN (R 4.0.2)
mime 0.10 2021-02-13 [1] CRAN (R 4.0.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
mvabund 4.1.12 2021-05-28 [1] CRAN (R 4.0.2)
nlme 3.1-152 2021-02-04 [1] CRAN (R 4.0.4)
openssl 1.4.3 2020-09-18 [1] CRAN (R 4.0.2)
pillar 1.6.5 2022-01-25 [1] CRAN (R 4.0.5)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
princurve * 2.1.6 2021-01-18 [1] CRAN (R 4.0.2)
progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.2)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.0.2)
ProtGenerics 1.22.0 2020-10-27 [1] Bioconductor
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.3)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.0.2)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.0.2)
RcppEigen 0.3.3.9.1 2020-12-17 [1] CRAN (R 4.0.2)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)
rlang 1.0.0 2022-01-26 [1] CRAN (R 4.0.5)
rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.0.4)
Rsamtools 2.6.0 2020-10-27 [1] Bioconductor
RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.0.2)
RSQLite 2.2.6 2021-04-11 [1] CRAN (R 4.0.2)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
rsvd 1.0.3 2020-02-17 [1] CRAN (R 4.0.2)
rtracklayer 1.50.0 2020-10-27 [1] Bioconductor
rvest 1.0.0 2021-03-09 [1] CRAN (R 4.0.2)
S4Vectors * 0.28.1 2020-12-09 [1] Bioconductor
sass 0.3.1 2021-01-24 [1] CRAN (R 4.0.2)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
scater * 1.18.6 2021-02-26 [1] Bioconductor
scLANE * 0.3.1 2022-05-01 [1] local
scran * 1.18.7 2021-04-16 [1] Bioconductor
scRNAseq * 2.4.0 2020-11-09 [1] Bioconductor
scuttle 1.0.4 2020-12-17 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
shiny 1.6.0 2021-01-25 [1] CRAN (R 4.0.2)
SingleCellExperiment * 1.12.0 2020-10-27 [1] Bioconductor
slingshot * 1.8.0 2020-10-27 [1] Bioconductor
sparseMatrixStats 1.2.1 2021-02-02 [1] Bioconductor
statmod 1.4.35 2020-10-19 [1] CRAN (R 4.0.2)
stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
SummarizedExperiment * 1.20.0 2020-10-27 [1] Bioconductor
survival 3.2-10 2021-03-16 [1] CRAN (R 4.0.2)
svglite 2.0.0 2021-02-20 [1] CRAN (R 4.0.2)
systemfonts 1.0.1 2021-02-09 [1] CRAN (R 4.0.2)
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.0.2)
tidyr 1.1.4 2021-09-27 [1] CRAN (R 4.0.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
tweedie 2.3.3 2021-01-20 [1] CRAN (R 4.0.2)
utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
uwot 0.1.10 2020-12-15 [1] CRAN (R 4.0.2)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.2)
vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.2)
viridis 0.6.0 2021-04-15 [1] CRAN (R 4.0.4)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.0.2)
webshot 0.5.2 2019-11-22 [1] CRAN (R 4.0.2)
withr 2.4.3 2021-11-30 [1] CRAN (R 4.0.2)
xfun 0.29 2021-12-14 [1] CRAN (R 4.0.2)
XML 3.99-0.6 2021-03-16 [1] CRAN (R 4.0.2)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.2)
XVector 0.30.0 2020-10-28 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
zlibbioc 1.36.0 2020-10-28 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library